ABSTRACT


Based on recent, continuing advances in semiconductor
technology, classical N-Modular Redundancy appears to be a
viable approach for designing ultra-safe flight control systems
in the near future.

RAMP (the redundant asynchronous microprocessor structure)
consists of distributed sets of parallel computers partitioned
on the basis of software and packaging constraints.

To minimize hardware and software complexity, the
processors operate asynchronously. It is shown that through the
design of asymptotically stable control laws, data errors due to
the asynchronism can be minimized. It is further shown that
by designing control laws with this property and making minor
hardware modifications to the RAMP modules, the system
becomes inherently tolerant to intermittent faults.

A laboratory version of RAMP has been constructed and
is described in the paper along with the experimental results
obtained to date.


I. INTRODUCTION







The authors are currently engaged in an avionics systems
research program seeking design methodologies for realizing
future high authority autoflight control systems. This effort is
broad in scope and includes control law development,
experimentation with sensors and actuators, and the investigation
of distributed microcomputer architectures. The latter, in
particular the redundant asynchronous microprocessor (RAMP)
structure, is currently being investigated and is the subject of
this paper. Before describing RAMP in detail, it is useful to
first describe the general structure and the key elements that
collectively make it a new and different approach in implementing
digital avionics systems. This is done in the following section.


II. RAMP - AN OVERVIEW


Given the requirement for increased aircraft operational
complexities by the 1990's (real time air traffic control, fuel
minimization, autoland, etc.) it is clear that the digital computer
will become the dominant component in avionics systems
implementations. This need for more complex operations
is at first sight an invitation for more complex avionics systems.

It has been the authors' core philosophy to research
concepts and methodologies that maximize digital avionics
system simplicity without compromise of future operational
requirements.

One result of such work is the compact Total Automatic
Flight Control System (TAFCOS) algorithm (described in
References 1 and 2) applicable to highly non-linear systems.

This same thinking has led the authors to depart from
the conventional (e.g. data processing industry) approach of
employing digital systems in an avionics environment and to
instead investigate RAMP. Before discussing the extent of
this departure, it is useful to describe the overall approach
being taken as follows.




As seen in Figure 1, RAMP comprises a connected
network of microcomputers which has as input command and
sensor information and which generates servo information to
drive actuators, thrust linkages, etc. Each microcomputer
in the network moreover performs a specific, well-defined
input/output function such as sensor data conversion and
preprocessing, execution of flight control algorithms,
servo-positioning, etc. Although consisting of digital system
elements, the network is intentionally structured and operated
as a conventional analog control system: sensor data are input
and servo data are output as a function of command mode inputs.

Correspondingly, the individual microcomputers "appear"
as analog-like elements each having a predetermined set of
(command) selectable input/output characteristics. The
microcomputers at each node are moreover autonomous:

1) Each has its own clock, i.e., the network has no
master or central timing reference, with the result
that the microcomputers in the network are not
synchronized.

2) The computers are electrically isolated to the
extent that hardware failures do not propagate
from a given, failed microcomputer.




3) Interconnect between microcomputers is confined to
data traffic such that a given microcomputer simply
"broadcasts" messages to one or more of the other
microcomputers in the network, the received
messages being automatically buffered and used by
the receiving microcomputers.

Finally, to ensure flight safety in the event of hardware
failures, redundant microcomputers are employed at each node
as depicted in Figure 2 such that failures in a transmitting
microcomputer (or the associated data transmission path) will
be recognized by a receiving microcomputer in that the former's
data will be disparate with that of the remaining "good"
microcomputers. Hence the receiving computer (which normally
will also be replicated) can select the correct data.

Now this approach differs from the conventional, data
processing approach (e.g. References 3 and 4) to digital systems
implementation in several respects:

1) The RAMP network and modules are structured
to perform a limited range of specific, analog-like
functions; this is to be contrasted with the use of
a network of general purpose computers.




2) The RAMP microcomputer modules are autonomous;
the system does not employ an operating system,
global executive software or complex intermodule
communications software.

3) The modules are not synchronized; i.e., central
timing hardware and software are not employed.

4) Finally, tolerance to hardware failures is
achieved by static redundancy (Reference 5), i.e.
results of a failed microcomputer are simply
rejected; this is done in lieu of dynamic redundancy
(Reference 5) wherein the distributed computer
system performs real time fault detection and
reconfiguration of the system.

The foregoing differences are summarized in the
table which follows.

TABLE 2-1: RAMP VERSUS CONVENTIONAL DIGITAL SYSTEMS
           IMPLEMENTATION

RAMP                              CONVENTIONAL

• Fixed function,                 • General purpose
  analog-like                       data processing
  implementation                    implementation

• No operating system or          • Centralized operating
  global executive.                 system, global executive
  Limited module-to-module          software, intermodule
  communication software            communication software

• Asynchronous                    • Synchronized

• Static redundancy               • Dynamic redundancy;
                                    requires real time
                                    fault identification
                                    and reconfiguration

• Low complexity                  • High complexity


III. THE RAMP NETWORK STRUCTURE AND OPERATION


The previous section has described the general architecture
employed in RAMP.

Figure 3 shows a specific network structure based on
the RAMP concept which is being investigated by the authors in
the current flight research program. As shown, computers are
distributed into five sets of triplets (i.e. N = 3).

There are three reasons for partitioning the computation,
given in their order of importance as follows:

1) Figure 3 shows that the system computation process
divides into three groups: sensor/command processing,
the control algorithm and servo processing. In the
airborne application, each of these groups would be
expected to physically reside in different parts of
the aircraft system. (I.e. the partitioning into
distinct computational sites is actually governed by
packaging and cabling constraints.)

2) As will be discussed in more detail in Section V,
the reliance on static redundancy for flight safety
presumes that each computer be fully self-tested
before flight. The distributing of the computation
load into several, smaller microcomputers facilitates
reasonably short yet comprehensive preflight tests.
3) The final motivating factor in distributing the
computation is that hardware boundaries can be set on
the system software. E.g., as depicted in Figure 3,
software is partitioned into corresponding hardware
modules with the result that software can be
concurrently developed and, more important, modified
without affecting the remainder of the software in the
system.

The network operates in the following manner: 

1) Microcomputers in the set {M1j} input data from a triply
redundant set of sensors* and redundant (crew
generated) commands, the latter consisting of
trajectory waypoints and flight modes. The
microcomputers process the sensor data (correct values
being selected from the redundant inputs) and
derive estimates of the aircraft states.
Correspondingly, correct waypoints and mode commands
are selected. (The exact mechanics of this
selection process are discussed in the next section.)
Each microcomputer in the set {M1j} transmits
results to all microcomputers in the set {M2j}.

*It is also possible in this type of application to employ an
analytically redundant sensor mix as described in References
6 and 7.

2) As noted, each microcomputer in the set {M2j}
has the results of computations made by all the
microcomputers in the set {M1j}. From this set of
inputs, each microcomputer in {M2j} selects,
by voting, the correct state estimates, command
modes and commanded waypoints and computes the
trajectory to be flown by the aircraft. These results
are then transmitted to microcomputers {M3j}.

3) This process continues in a similar manner,
microcomputers {M3j} selecting results, computing the
control (using the commanded and actual trajectory
states) and transmitting these to {M4j}, and so on.

4) The final set of microcomputers {M5j} drive
redundant actuators. (A current trend in ultra-safe
actuator research is not to use redundant actuators
but instead single actuators with redundant hydraulic
valving systems. These devices operate on redundant
electrical inputs. See for example Reference 8.)

It is seen in this description that information flows from
left to right in the figure. Each microcomputer moreover
operates recursively in what will be referred to subsequently
as computation "frames". In the current experiments, the
period T of these frames is constant (T = 50 ms in the present
laboratory version of RAMP). (Section VIII discusses this
aspect of timing in more detail.) As a result, the computation
process of the system as a whole is a combination of parallel
processing (e.g. within a given microcomputer set) and pipeline
processing (Figure 4).

A key feature of the RAMP structure as depicted in 
Figure 3 is that each microcomputer in the network is designed 
to be autonomous. This autonomy is achieved by: 

1) Letting each microcomputer have its own clock 
such that as a whole, the system is not dependent 
upon a central timing reference. 

2) Having each microcomputer employ a set of buffer
memories that are independently written by other
microcomputers in the network. The configuration
of these memories and the interconnecting communication
paths are depicted in Figure 5. The essential
features of this detailed structure are:
a) Each microcomputer in the set {Mij} writes
data into all microcomputers in the set
{M(i+1)j}. Data are simply "forced" into
these memories during the former's computation
frame. (I.e. there is no "handshaking" or
other coordinating of the data transfer between
microcomputers.)

b) Each microcomputer in the set {M(i+1)j}
fetches data from the buffer memories when
needed during its computation frame. (The
means of avoiding read/write conflicts is
discussed in Section VIII.)

c) Electrical isolation (e.g. high impedance and/or
opto-isolation) is employed as shown between
computers.
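
The buffer arrangement in items a) and b) can be sketched in a few
lines of Python. This is only an illustrative model under stated
assumptions: the class names BufferMemory and ReceivingNode, the
method names, and the zero initial contents are hypothetical and are
not taken from the RAMP hardware, and the read/write conflict
handling deferred to Section VIII is not modeled.

```python
class BufferMemory:
    """One receive buffer, written by exactly one upstream microcomputer."""

    def __init__(self, size):
        self.words = [0.0] * size  # assumed initial contents

    def force_write(self, values):
        # The transmitter overwrites the buffer during its own frame;
        # there is no handshake with the receiver.
        if len(values) != len(self.words):
            raise ValueError("message length must match buffer size")
        self.words = list(values)

    def fetch(self):
        # The receiver reads whatever is currently stored.
        return list(self.words)


class ReceivingNode:
    """A microcomputer holding one buffer per upstream transmitter."""

    def __init__(self, n_transmitters, size):
        self.buffers = [BufferMemory(size) for _ in range(n_transmitters)]

    def rows(self):
        # Group the j-th value from every buffer so that the redundant
        # copies of the same quantity sit side by side.
        return [list(row) for row in zip(*(b.fetch() for b in self.buffers))]


node = ReceivingNode(n_transmitters=3, size=2)
node.buffers[0].force_write([1.0, 2.0])
node.buffers[1].force_write([1.0, 2.0])
# The third transmitter is assumed to have failed and never writes.
print(node.rows())
```

In this sketch the receiver never waits on a transmitter, so a
silent or failed transmitter only leaves stale data in its own
buffer; the other two buffers are unaffected.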

The purpose of providing this autonomy is to prevent
propagation of hardware failures in any given microcomputer
or in any given transmission path to the other hardware elements
in the system. As a result, for the system of Figure 3, up to
one microcomputer in each triplet can undergo a hardware failure
without affecting the hardware integrity of the remaining pair
of microcomputers in the triplet.

Now the handling of such failures has been somewhat
vaguely referred to in the foregoing (e.g. selection of "correct"
data, "voting", etc.). This topic is however central to the RAMP
concept and is discussed at length in the remainder of the paper.
Before embarking upon this however, it is important to note that
the employment of asynchronous microcomputer modules places
an important constraint on the design of the control laws
implemented in RAMP. This is discussed in the next section.


IV. FLIGHT CONTROL WITH PARALLEL, ASYNCHRONOUS COMPUTERS


A characterizing feature of RAMP is that each
microcomputer in the network has its own clock. Hence as a result
of variations in (oscillator) components from microcomputer to
microcomputer, operation of the system as a whole is asynchronous.

It has already been stated that this asynchronism places 
an important constraint on the design of the control laws hosted 
by the network. This is discussed in the following. 

With little sacrifice of generality, consider a set of
parallel computers employed in the control of a plant as
illustrated in Figure 6. In the current control law work, aircraft
control u is obtained from the plant states y using the following
recurrence equation:

    u((i+1)T) = A u(iT) + B y((i+1)T)                    (4-1)

where T is the period of a computation frame (see Section III and
Figure 4). Referring back to Figure 6, it will be assumed that
only one computer is selected for control of the plant and that
its computation frame has period T. Correspondingly, it will
be assumed that the remaining computers have different and
unequal computation frame periods such that only that one
computer is selected. (Some generality is lost here since the
select process is a function of the output of all the computers.
What follows therefore is a necessary condition for the
properties of Equation 4-1.)

Next consider the case in which a deselected computer Cj has a
shorter computation frame period by an amount δT such that
after execution of n computation frames by the selected computer,
computer Cj has executed exactly (n + 1) computation frames,
i.e.,

    n δT = T                                              (4-2)

Looked at another way, computer Cj would execute exactly
one more computation frame than the selected computer every
nT seconds.

It is shown in Appendix A that due to the timing error,
computer Cj will generate an error having the following
recurrence relation:

    u((j+1)nT) = A^{n+1} u(jnT)
                 + A^{n} (A - I) u(jnT) + A^{n} B y(jnT)  (4-3)

It is shown (also in Appendix A) that to guarantee
convergence of this error, it is necessary that the control law
being computed (by all the computers) be asymptotically stable.
(This result can in fact be obtained in a more general
manner by arguing that the outputs of the deselected computers
are uncontrollable such that to ensure convergence of the error,
the control law being executed must be asymptotically stable.)

The effects of the timing error can be illustrated by
considering the example of Figure 7 which depicts a simple
second order system employing a stable, metastable and unstable
control as shown. The system response (to a unit impulse)
is the same for each controller and is shown in Figure 8a.
Figure 8b shows time histories of the errors that would exist
in a deselected computer having a 10% timing error* (i.e.,
δT/T = 0.1).

To generalize these examples, it is clear that by
employing an unstable or metastable control algorithm, a
deselected microcomputer can, as a result of timing errors alone,
accumulate excessive data errors or possibly be incapacitated
(e.g. as a result of overflows).
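
The qualitative behaviour described above can be reproduced with a
small numerical sketch. It is not the example of Figures 7 and 8:
it assumes a scalar form of the recurrence (gains a and b in place
of the matrices A and B), a constant plant input y, and a deselected
computer that simply executes one extra frame every n frames while
reusing the latest input sample. The function name and all numerical
values are illustrative.

```python
def timing_error_history(a, b, n, frames, y=1.0):
    """Error between a fast deselected computer and the selected one.

    Both run u <- a*u + b*y once per frame of the selected computer;
    the deselected computer squeezes in one extra frame every n frames.
    """
    u_selected = 0.0
    u_deselected = 0.0
    errors = []
    for k in range(1, frames + 1):
        u_selected = a * u_selected + b * y
        u_deselected = a * u_deselected + b * y
        if k % n == 0:
            # Extra frame caused by the shorter frame period.
            u_deselected = a * u_deselected + b * y
        errors.append(u_deselected - u_selected)
    return errors


stable = timing_error_history(a=0.5, b=1.0, n=10, frames=200)
metastable = timing_error_history(a=1.0, b=1.0, n=10, frames=200)
unstable = timing_error_history(a=1.1, b=1.0, n=10, frames=200)
print(abs(stable[-1]), metastable[-1], unstable[-1])
```

With |a| < 1 the error introduced by each extra frame decays; with
a = 1 each extra frame adds a permanent offset, so the error grows
with the number of slipped frames; with a > 1 it grows without bound.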

Consequently, in RAMP (more specifically in the TAFCOS
algorithm discussed briefly in Section II), the control laws are
designed to be asymptotically stable. This design policy, which
permits control of data errors due to timing, has an equally
important role in RAMP's tolerance to intermittent faults.

*In the practical application such timing errors are typically
much smaller, e.g., .01% to .001%.

Before discussing this aspect, the subject of faults and
fault tolerance is first explored in the following sections.


V. FAULT TOLERANCE, MASKING AND IDENTIFICATION

In the previous sections, the RAMP concept has been
illustrated using triplicated computer sets such that each computer
(with the exception of those interfaced to the sensor inputs) has
three buffer memories.

In terms of the general concept of RAMP (Section II)
each computer will have N (= 2n + 1) memories containing data
generated from N redundant computers. (In general, more
than one redundant set may input to a given computer in which
case there will be as many memories as there are computers.
This case is obviously included in the discussion that
follows.)

Each of the N memories will in turn contain a total of
K (real number) data values placed there by the corresponding
transmitting computer. This is illustrated in Figure 9 where
for example Djk corresponds to the jth data value in the kth
buffer memory.

Now where there are no timing errors (i.e. differing
clock rates in the transmitting computers) or faults (in the
transmitting computers, data transmission paths, and/or the
buffer memories), the data in a given row of Figure 9 will be
identical. However, where timing errors or faults exist,
these data values can differ.

The basic approach in RAMP is to use these data in pro- 
viding fault tolerant performance of the system. Before discussing 
the specific strategies employed by RAMP, it is necessary to 
consider the nature of the faults themselves. 

First, it is assumed that the faults result from random
hardware failures.* I.e., common mode or "generic" sources
of failure arising from design mistakes, external effects due to
heating or EMI, fabrication mistakes, etc. have all been accounted
for in the system design and development process.

Second, the faults being considered may be either permanent
or intermittent (Reference 5). (Intermittent faults are a crucial
issue in the "real world" implementation of systems such as
RAMP and are discussed in Section VII.)

Third, it is assumed that the hardware faults experienced
in a given module are confined to the module and do not propagate
(i.e. the microcomputer modules are fully autonomous). Note
that this latter assumption is readily confirmed in practice by
simply enumerating the input/output hardware failure modes for
each microcomputer module and verifying that none affects the
function of the remaining microcomputers in the network.

*For the electronic components of the type employed in
microelectronic systems such as RAMP, a constant failure rate
model is employed. See for example, Reference 9.


Given these types of faults, RAMP employs two basic
methods for realizing fault tolerant performance:

1) Mid-Value Select 

Due to variations in the clock rates in the individual 
microcomputers the data values corresponding to the 
rows of Figure 9 will include errors due to timing. 
I.e. , the data values will be dispersed along the 
real line as shown in the example of Figure 10. 

The basic strategy employed by all the microcomputers
in RAMP is to select, as the correct data value, the
mid-value of these dispersed operands* (e.g. D in
Figure 10). More sharply stated, given the set of
N (= 2n + 1) values (some subsets of which may be
equal) in a given row of Figure 9, a value will be
selected such that n of the remaining values are
greater than or equal to it and the other n are less
than or equal to it.

The postulate here is that this mid-value will
approximate, with a known error, the correct value
in spite of any kind of failure in up to n of the
transmitting microcomputers. (A detailed argument
behind this is reviewed in Appendix B.)

*Mid-values for the sensor and command values are
selected as well.

The net result is that for a given set of data
values, only one value is selected and 2n are
ignored. I.e., the selection process "masks" the
results of any failed units along with those of the
remainder of the "good" units.

2) In-Flight Fault Identification 

The fundamental strategy behind RAMP is to
rely on the known reliability of redundant hardware
to safely execute flight-critical operations. Specifically,
RAMP does not employ any software other than the
mid-value select to compensate for faults experienced
in flight.

Faults experienced during flight are however
identified, but for two different purposes:

a) To alert the flight crew that failure(s) have
occurred. Here, the decision to continue or
to abort flight critical operations with the
degraded system is a crew decision.

b) To provide a flag and record for system
maintenance of both permanent and intermittent
faults experienced during the flight operations.

The basic approach to in-flight fault identification
is simple threshold detection, which is described as
follows.

Referring back to Figure 10, it has already been
noted that some of the data values will be dispersed
as a result of timing errors in the computer clocks.
Based on analysis and flight test results, an expected
maximum range can be determined for the dispersion
of these data. Correspondingly, maximum ranges
can be established for each redundant sensor and
redundant command input.* During flight, the actual
range of each data value is determined and compared
to the corresponding, predetermined maximum
range. Where the actual range exceeds that maximum,
the flight crew and maintenance recorder are signalled.

*Digital command inputs must of course be identical.

To this point in the discussion, little has been
said about system reliability which is of course the
major concern with RAMP. Reliability modelling and
techniques for deriving reliability estimates are well
covered in the literature (e.g. References 9 and 10)
and will not be discussed in this paper with one
exception. This is the fact that to estimate the
reliability of RAMP for flights of given duration
(e.g. ten hours maximum) it is necessary to
determine, at the outset of flight, the existence and
nature of any faults residing in the system.

As a result, the RAMP system must be tested
prior to flight. This is discussed in the next
section.
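
The two methods of this section, mid-value select and threshold
detection, can be sketched in Python as follows. This is a minimal
illustration rather than the RAMP module code: the function names,
the sample data, and the 0.1 maximum range are assumptions chosen
only for the example.

```python
def mid_value_select(values):
    """Return the mid-value of N = 2n + 1 redundant data values."""
    if len(values) % 2 == 0:
        raise ValueError("mid-value select expects an odd number of values")
    # After sorting, n values lie at or below the middle element
    # and n values lie at or above it.
    return sorted(values)[len(values) // 2]


def exceeds_max_range(values, max_range):
    """Threshold detection: flag a row whose spread is larger than expected."""
    return max(values) - min(values) > max_range


# One row of redundant data from a triplet (N = 3, n = 1).
healthy_row = [10.02, 9.98, 10.00]   # dispersed only by timing errors
faulty_row = [10.02, 9.98, 55.00]    # third transmitter has failed

for row in (healthy_row, faulty_row):
    selected = mid_value_select(row)
    flagged = exceeds_max_range(row, max_range=0.1)
    print(selected, flagged)
```

In the faulty row the failed value is masked by the selection (the
mid-value is still close to 10) while the range check raises the
flag intended for the flight crew and the maintenance record.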


VI. RAMP PREFLIGHT TEST


It has been pointed out in the previous section that in
order to derive valid estimates for the in-flight reliability of
a given implementation of RAMP, a preflight test is required.
Ideally, one seeks to design a test that will verify that no
faults exist. Moreover, the time required to perform the test
must be short (in the RAMP application the preflight test
time should be under ten minutes). Now it is well known in
digital systems practice that the problem of fully testing LSI
components of the type employed in RAMP is intractable to the
extent that the required test times have astronomical dimensions.
Instead, one settles for testing to a given "coverage", defined
as the percentage of all possible faults that can be uncovered with
a given test method. (See for example Reference 11.)

Before proceeding with the discussion of this subject it
is appropriate to point out that the problem of designing tests
to a known coverage and, more important, proving that such
coverage has been obtained is at present an open research
question. This applies not only to RAMP but in fact to LSI-based
systems at large. What follows therefore is a description of
the authors' approach to this problem in the context of designing
and verifying the preflight test for RAMP.

Before discussing the approach to fault testing, it is
necessary to look more closely at the nature of the faults
themselves. Recalling that the microcomputer modules in RAMP
do not propagate their failures, concern is confined to the ways
a given microcomputer module can fail.

To do this, a "top-down" or fault tree (Reference 9)
approach is currently being investigated. This approach is
explained by the following example.

Consider a single microcomputer module executing the
control algorithm given by Equation 6-1. This is illustrated
in Figure 11a, where the control algorithm consists of the
repetitive execution of the arithmetic replacement statement,

    u = Au + By.                                          (6-1)

It has already been pointed out that each RAMP module operates
recursively, inputting data, performing mid-value selection,
performing threshold detection (to flag in-flight errors),
executing the control algorithm, and outputting results. This
process is illustrated by the program shown in Figure 11b in
which all the foregoing functions with the exception of the control
algorithm are performed by procedure (i.e. subroutine) calls.
The TIMEOUT procedure is invoked as a part of this code in
order to establish the computation frame time T (Section III).
Hence, the program of Figure 11b is endlessly executed every
T seconds.
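
The frame structure just described can be sketched as a short Python
loop. This is not the program of Figure 11b, which is not reproduced
in the text: it assumes a scalar control law (gains a and b), takes
the redundant inputs for each frame from a prepared list, and replaces
the frame-timing procedure with a comment. All names are hypothetical.

```python
import statistics


def run_frames(redundant_inputs, a, b, max_range):
    """Execute one pass of the module loop per entry in redundant_inputs."""
    u = 0.0
    outputs = []
    flags = []
    for row in redundant_inputs:                 # input data
        y = statistics.median(row)               # mid-value selection
        flags.append(max(row) - min(row) > max_range)  # threshold detection
        u = a * u + b * y                        # control algorithm (6-1)
        outputs.append(u)                        # output results
        # A real module would now wait out the remainder of the
        # frame period T before jumping back to the top.
    return outputs, flags


inputs = [
    [1.0, 1.0, 1.0],      # all three transmitters agree
    [1.0, 100.0, 1.0],    # one transmitter delivers bad data
]
outputs, flags = run_frames(inputs, a=0.5, b=1.0, max_range=0.1)
print(outputs, flags)
```

The bad value in the second frame is flagged but does not reach the
control computation, since the median of an odd-length row is the
mid-value.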

To begin the discussion of faults from a "top-down"
viewpoint, note first that the microcomputer module executing
the algorithm of Figure 11 has just one failure mode: it will
not generate a correct u(t) for some y(t). It is the postulate
that in RAMP modules this single failure mode can arise from
only three kinds of faults:

1) Faults that cause data alteration.

2) Faults that result in improper execution of the
designed code.

3) Faults that cause excessive timing errors.

Considering the first of these, it is further postulated
that under the assumption that the code is being correctly
executed, the microcomputer module data can be altered only
as a result of one or more of the following:

a) Faults in the path(s) connecting the input data
y(t) to the CPU (i.e. accumulator).

b) Faults in the path(s) connecting the CPU (i.e.
accumulator) to the output of the module.

c) Faults in the constant data (e.g. A and B
in Figure 11) store.

d) Faults in the read/write data (e.g. u) temporary
store.

e) Faults in the CPU that correctly evaluate
arithmetic and/or logical expressions for some
operand pairs but not others. In the example
at hand such faults would include those that give
invalid results for u for some values of y but
not others.

Next, in considering improper program execution, we first
note that in RAMP the instructions are fixed, i.e. each module
is preprogrammed to repetitively execute a predetermined, exact
sequence of instructions. The instructions moreover fall into
two categories:

a) Those that are data-sensitive. I.e. the path taken
in the program flow is determined by the data being
processed. (Data-sensitive instructions are illustrated
in Figure 11c.)

b) Those that are data-insensitive (e.g. the procedure
CALLS and JUMP TO TOP instructions in
Figure 11b).

Correspondingly, improper program execution can be
traced to:

a) Faults in the CPU that correctly execute data-sensitive
instructions for some data values but not others.

b) Faults in machine code store, instruction counters,
address decoders, read/write stack, etc.

Finally, excessive timing errors can be traced to faults
in the timing reference (e.g. crystal), oscillator amplifier
failures, etc.

The discussion of the example is summarized in Figure 12
which the authors postulate embraces all the possible ways a
RAMP module can fail.
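
As a reading aid, the fault classification developed above can be
written down as a nested data structure. The sketch below is a
paraphrase of the text, not a transcription of Figure 12; the
wording of the entries and the helper name base_faults are
illustrative.

```python
FAULT_TREE = {
    "module does not generate a correct output": {
        "data alteration": [
            "path from input data to CPU",
            "path from CPU to module output",
            "constant data store",
            "read/write data store",
            "CPU arithmetic/logic, operand dependent",
        ],
        "improper program execution": [
            "CPU data-sensitive instructions, data dependent",
            "code store, instruction counter, address decoder, stack",
        ],
        "excessive timing error": [
            "timing reference or oscillator amplifier",
        ],
    }
}


def base_faults(tree):
    """Collect the leaves of the tree, i.e. the faults a test must target."""
    leaves = []
    for value in tree.values():
        if isinstance(value, dict):
            leaves.extend(base_faults(value))
        else:
            leaves.extend(value)
    return leaves


for fault in base_faults(FAULT_TREE):
    print(fault)
```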

The objective of the preflight test therefore is to test for
the faults depicted at the base of the figure. Since the preflight
test is initiated immediately after system start-up (Section VIII),
the system will initially "appear" synchronized. (For example,
if the clock frequencies of all the microcomputers are within
a readily achievable .001% of one another, a system with a 50 ms
computation frame will "appear" to be synchronized for
some 80 minutes after system startup.) Hence, the preflight
test will be carried out during this period of apparent
synchronization.
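
The figure of some 80 minutes can be checked with a short
calculation. The sketch assumes that the system stops "appearing"
synchronized once the accumulated clock drift between two modules
amounts to one full computation frame; that criterion and the
function name are assumptions made here for illustration.

```python
def apparent_sync_minutes(frame_period_s, clock_mismatch_fraction):
    """Time until two free-running modules drift apart by one frame."""
    drift_per_frame_s = frame_period_s * clock_mismatch_fraction
    frames_until_slip = frame_period_s / drift_per_frame_s
    return frames_until_slip * frame_period_s / 60.0


# 50 ms frames, clocks within .001% of one another.
minutes = apparent_sync_minutes(0.050, 0.001 / 100.0)
print(round(minutes, 1))
```

Under this assumption the drift is 0.5 microseconds per frame, so a
full frame of slip takes about 100,000 frames, or roughly 83
minutes, which is consistent with the value quoted in the text.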

In the current work the plan is to carry out the preflight
test for each module in two steps:

1) A self-test of each module would be initially performed
to:

a) test the CPU for faults associated with
evaluation of arithmetic and logical operations
and for faults in data-sensitive instructions.

b) test the read/write store.

This test would be carried out concurrently by all the
microcomputers.

2) An input/output test. To perform this test, the
RAMP structure would be supplemented with two
microcomputer modules, one that inputs test patterns
to the sensor/command microcomputers and a second
to monitor outputs of the servo microcomputers.
(This latter computer would also be employed for
in-flight fault identification.)

Given these additional microcomputers, and
referring back to the basic structure of RAMP as
depicted in Figure 3, the network structure under
test would have each set of redundant microcomputers
receiving test inputs from at least one microcomputer
and each set would likewise be monitored by at least
one microcomputer. In executing the input/output
test, each microcomputer in a redundant set would
accordingly:

a) Generate checkwords signalling success or
failure of the self-test.

b) Receive and verify input test patterns validating
data input paths.

c) Transmit all the contents of its constant store.

d) Receive a sequence of data inputs that test all
paths in the instruction code and transmit the
results to a monitoring computer (i.e. to the
next set of redundant computers or the
microcomputer monitoring the servo microcomputers'
outputs).

e) Transmit its value of time.

f) Transmit test patterns to check the downstream
computers' inputs.

g) Generate data patterns that will exercise all
the instruction paths in the downstream
computers.

Recalling that the system appears synchronized
during this test, each microcomputer monitoring a
redundant set must receive identical data from each
microcomputer in that set. Success of the preflight
test is accordingly achieved by a "bit-by-bit" match
of these data for the full duration of the test.
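
The pass criterion of the input/output test can be sketched as a
comparison of the byte streams a monitoring computer receives from
the members of one redundant set. The representation of the received
data as Python bytes objects and the function names are assumptions
made for illustration.

```python
def first_mismatch(streams):
    """Return the first byte position where the streams differ, or None."""
    shortest = min(len(s) for s in streams)
    for i in range(shortest):
        if len({s[i] for s in streams}) > 1:
            return i
    if any(len(s) != shortest for s in streams):
        # A stream that is longer or shorter than the others also fails.
        return shortest
    return None


def preflight_match(streams):
    """True only if every module in the set sent identical data."""
    return first_mismatch(streams) is None


good_set = [b"\x01\x02\x03", b"\x01\x02\x03", b"\x01\x02\x03"]
bad_set = [b"\x01\x02\x03", b"\x01\x02\x03", b"\x01\x06\x03"]
print(preflight_match(good_set), preflight_match(bad_set))
print(first_mismatch(bad_set))
```

Unlike the in-flight mid-value select, this check tolerates no
dispersion at all, which is why it relies on the period of apparent
synchronization after start-up.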

Returning to the subject of coverage, the foregoing test
covers all the faults depicted in Figure 12. Given the validity
of the postulate that these represent all possible faults, the
actual coverage that can be attained by this testing procedure is
governed by:

1) The coverage that can be achieved in the CPU and
read/write store self-tests.

2) The probability of compensating faults, e.g. a
failed CPU test routine that signals a "success"
checkword.

The foregoing preflight test approach is currently being 
investigated with the current laboratory version of RAMP 
(Section VIII). Results will be presented in the future. 











VII. INTERMITTENT FAULT TOLERANCE 

In Section V it was noted that faults can be classified 
as being either permanent or intermittent. 

Of these two types of faults, the permanent faults are 
best understood: the bulk of the available electronic component 
failure rate data and failure mechanism models are based on 
permanent faults, they are more readily uncovered by testing, 
and their effects on system performance are easily determined. 

Intermittent faults, on the other hand, are less well understood, 
yet in "real world" applications they account for the majority of 
failures in computer systems (some 80% to 90%, as estimated in 
Reference 12). 

For the purposes of the discussion that follows, the effects 
of these intermittent faults can be inclusively classified as being 
either permanent or transient. In the context of microcomputer 
systems of the type that would be employed in RAMP, such 
permanent faults are those in which an intermittent fault has 
altered normal program flow to the extent that the microcomputer 
enters an infinite loop, enters a halt state, etc., i.e., the 
microcomputer system is "crashed". With the transient faults, however, 
normal program flow is resumed after disappearance of the 
intermittent fault, but usually with the consequence that the data 
being processed have been altered. These two consequences 
of intermittent faults are depicted in Figure 13, which also shows 
how these relate to the faults discussed in the previous Section. 

Note in the figure that intermittent timing faults result 
in transient system faults. This particular type of fault has 
been indirectly discussed in Section IV, i.e., by virtue of 
use of an asymptotically stable control law, the errors introduced 
by an intermittent timing fault will, in time, converge to zero. 
Now it should be clear from Figure 13 that this will also be the 
case for all the other intermittent faults that produce transient 
system faults. I.e. the RAMP microcomputer module is 
inherently fault tolerant to these types of intermittent faults. 

Permanent system faults, i.e., those resulting from 
broken program flow, can be accommodated by the simple 
expedient of using external hardware to detect a loss in program 
flow and force the microcomputer into a restart or retry 
(Reference 12) operation. (As will be seen, such external 
hardware is already available in current microcomputer 
components.) This can be done in a (probably endless) variety 
of ways. Hence, the following describes one such approach 
mainly to illustrate the simplicity and effectiveness 
of the basic idea. 

First, note that to initiate operation of a microcomputer, 
the device is normally supplied with an external reset signal 
(or pulse) that results in the start of the computer program at 
some predesignated location (i.e. address). Since each RAMP 
module will be expected to perform different functions (e.g. the 
preflight test, processing of the flight algorithm, etc.) the 
module will receive (i.e. have deposited in buffer memory) 
not only data, but a command word that when fetched will 
indicate the specific function the module should be performing 
at that time. Hence, to start system operation, a RAMP module 
will be given a hardware reset and its program will "look" at 
the command word (which, following the previous section, will 
signal the module to begin the preflight test). At the completion 
of the preflight test, the microcomputer module would receive a 
different command word to signal start of processing of the control 
algorithm. 

Given this, a retry or restart due to broken program flow 
can be achieved using the hardware structure shown in Figure 14a. 
The idea here is to supplement the program employed in the 
RAMP module with output instructions such that under normal 
program flow, a pulse is output during each computation frame. 
I.e. in the unfailed condition, the microcomputer will generate 
a pulse train of period equal to the computation frame period 
$T_1$. Each of these pulses in turn will retrigger a monostable 
multivibrator as shown in the figure. As long as this pulse 
train is maintained, the multivibrator will output a constant 
level (OFF) as shown in Figure 14b. In the event of broken 
program flow, it can be expected that this pulse train will cease. 
Now the monostable multivibrator is adjusted such that at a time 
$T_2$ ($T_2 > T_1$) after the last pulse (Figure 14b) its output level 
shifts (ON) as shown. This output in turn is used to gate an 
astable multivibrator (Figure 14a) which in this gated condition 
generates a train of pulses having the period indicated in 
Figure 14b. The function of each of these pulses is to reset the 
microcomputer such that it begins processing at the reset address, 
fetches the command word and, recognizing that it should be 
executing the control algorithm, begins processing the algorithm 
from some predesignated initial state (e.g. u = 0 in Equation 6-1). 
Successful resumption of computation restores the pulse train 
driving the monostable multivibrator, causing its output to return 
to the OFF level. The astable multivibrator in turn is gated off 
as shown. 
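
The behaviour of this reset circuit can be sketched as a small integer-tick simulation. This is a simplified model under stated assumptions, not the laboratory circuit: the reset-pulse period `t3` and its constraint `t3 > t1` are assumptions made for the sketch (the value used in Figure 14b is not recoverable here), the fault is modelled as a window during which program flow is broken, and the function name `simulate_watchdog` is hypothetical.

```python
from typing import List, Tuple


def simulate_watchdog(
    t1: int, t2: int, t3: int, fault_start: int, fault_end: int, t_end: int
) -> Tuple[List[int], List[int]]:
    """Simulate a retriggerable monostable gating an astable reset generator.

    t1: computation frame period (heartbeat pulse period)
    t2: monostable timeout, must exceed t1
    t3: assumed period of the gated astable's reset pulses (must exceed t1)
    fault_start, fault_end: ticks [start, end) during which program flow is broken
    Returns (reset_times, heartbeat_times).
    """
    assert t2 > t1 and t3 > t1
    running = True
    next_beat = t1      # tick of the next heartbeat while running
    last_beat = 0       # last tick at which the monostable was retriggered
    next_reset = None   # next astable pulse while the astable is gated on
    resets: List[int] = []
    beats: List[int] = []
    for t in range(1, t_end + 1):
        in_fault = fault_start <= t < fault_end
        if in_fault:
            running = False              # broken program flow: no more pulses
        if running and t == next_beat:
            beats.append(t)
            last_beat = t                # retrigger the monostable (output OFF)
            next_beat = t + t1
            next_reset = None            # astable is gated off
        if t - last_beat >= t2:          # monostable timed out (output ON)
            if next_reset is None:
                next_reset = t           # first reset pulse at the timeout
            if t == next_reset:
                resets.append(t)
                next_reset = t + t3
                if not in_fault:         # reset succeeds once the fault is gone
                    running = True
                    next_beat = t + t1
    return resets, beats
```

With `t1=10`, `t2=15`, `t3=20` and a fault lasting from tick 33 to tick 60, the sketch issues one unsuccessful reset during the fault and one successful reset afterwards, after which the heartbeat train resumes and no further resets occur.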


Finally, it can be expected that the algorithm will initially 
be computing incorrect states following the retry. However, 
based on its convergence properties, it will, in time, reach the 
correct state. Hence, the above reset circuitry plus the 
convergence properties of the control law provide a RAMP 
module with the inherent high tolerance to intermittent faults 
enjoyed by most analog systems. 
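
The convergence after a retry can be illustrated numerically. The following is a minimal sketch that assumes a scalar stand-in, u[k+1] = a*u[k] + b*y[k+1], for the control law; the coefficients `a` and `b`, the input sequence, and the function names are hypothetical choices for illustration only.

```python
from typing import List, Sequence


def run_control_law(a: float, b: float, inputs: Sequence[float], u0: float) -> List[float]:
    """Iterate u <- a*u + b*y over the inputs, starting from state u0."""
    u = u0
    history = []
    for y in inputs:
        u = a * u + b * y
        history.append(u)
    return history


def restart_error(a: float, b: float, inputs: Sequence[float], u_true0: float) -> List[float]:
    """Error between an undisturbed module and one restarted from u = 0."""
    healthy = run_control_law(a, b, inputs, u_true0)
    restarted = run_control_law(a, b, inputs, 0.0)
    return [abs(h - r) for h, r in zip(healthy, restarted)]
```

Because both modules see the same inputs, the error after k frames is |a|^k times the initial state difference. It therefore decays to zero when |a| < 1 and grows without bound when |a| > 1, which is why the retry scheme relies on an asymptotically stable control law.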


I.e., as a result of the intermittent failure the module will 
be deselected and hence be running "open" loop. 

VIII. LABORATORY IMPLEMENTATION OF RAMP 


In order to verify its underlying concepts, RAMP has 
been implemented in the laboratory as shown in Figure 15. 
Here, as shown, a pair of redundant microcomputer triplets 
is used to generate trajectory commands and provide 
trajectory and attitude control of a rotorcraft plant mechanized 
on an analog computer. (Only the pitch plane has been 
mechanized in the laboratory; a full, six axis version of RAMP 
is currently under development.) 

The basic microcomputer module employed in the 
laboratory is shown in Figure 16 along with its basic specifications. 
This structure was selected by the authors as being representative 
of what in the 1985-1990 design period may be available on a 
single chip or, at most, a three chip family (e.g. see Reference 16). 
Each of the modules is implemented on a single card, the six 
modules interconnected on a wired backplane. 

The laboratory version employs a 16-bit architecture with 
NMOS technology. Future, single chip versions may be 
available in 32-bit architectures with faster (e.g. SOS) 
technology. See Reference 13. 

In the laboratory, the basic microcomputer module of 
Figure 16 is employed as a sensor microcomputer (the first set 
of triplets in Figure 15) and a servo computer (the second set 
of triplets). The structure of these modules is depicted in 
Figure 17 and their operation explained as follows: 


1) Each of the three sensor modules (Figure 17a) 
selects (analog) outputs from the rotorcraft plant 
and converts these to digital representation. 
(Non-redundant sensor inputs have been used to 
date.) The computer then generates trajectory 
commands (range and velocity) and executes a 
trajectory control algorithm based on the errors 
between these commands and the actual aircraft 
states. The outputs of the trajectory control 
algorithm are attitude control commands which, 
along with the measured attitude control states 
(pitch attitude and pitch rate), are written to the 
buffer memories of the servo modules. (Refer 
again to Figure 3 for the structure of 
interconnect between modules transmitting and receiving 
buffered data.) In addition to those command and 
data values, a "data ready" signal is transmitted 
to indicate to the receiving (servo) microcomputer 
that the memory has been written. The purpose 
of this signal is to circumvent the conflict of the 
receiving microcomputer trying to read a memory 
simultaneously being written by a transmitting 
computer. 

2) Each servo computer in turn reads the contents 
of its buffer memories, performs the mid-value 
select and threshold detection, executes the attitude 
control algorithm, and generates an analog control 
(Figure 17b). Control outputs feed a model of a 
"fly-by-wire" actuator (Figure 15) which in turn 
outputs a single control to the aircraft plant. 

The above implementation of RAMP has verified that the 
basic concepts behind RAMP can be realized in practice, 
specifically: 

1) Minimum Complexity Realization 

One gauge of complexity in a fault-tolerant 
computational system is the amount of "overhead" 
that is required to provide fault tolerance. In terms 
of computational resources, one is concerned principally 
with the demands on available program memory and 
execution time. Figure 18 shows the fractions of 
program memory and execution time that constitute 
the overhead of the current laboratory version of 
RAMP as would be applied in the six-degree-of-freedom 
application. (In the figure, "FAULT" 
refers to the mid-value-select, threshold-detect 
code; "COMM" to the code for writing and reading 
buffer memories; and "OTHER" to the module's 
mainline program, initialization procedures, etc. 
"AVAILABLE" refers to that portion of the resource 
which can be used for the module's intended 
application.) 

Note that the pre-flight test is not included 
in the figure (the current version being evaluated 
employs approximately 0.5K of program memory 
and requires approximately 4 minutes to execute). 

2) Asynchronism 

Clock rates can and have been varied in the 
laboratory to demonstrate the convergence properties 
of the asymptotically stable control algorithms. 

3) Mid-Value Select; Threshold Detection 

The mid-value select strategy has, to date, 
been employed in the laboratory simulations with 
no effects on system stability or accuracy. The 
coverage of the threshold detection process is still 
under investigation and will be reported in the 
future. 

4) Intermittent Fault Tolerance 

The reset circuit method described in Section VII 
has been implemented and tested successfully in the 
laboratory. 

IX. CONCLUSION 


As explained earlier, RAMP is targeted to the 
VLSI microcomputer components that are projected to be 
commercially available in the late 1980's. I.e., implementation 
of RAMP using present-day components is not practical, chiefly 
due to the large volumes required for packaging. Correspondingly, 
the reliability levels that can be achieved by RAMP will depend 
upon the reliabilities of these future components. LSI 
semiconductor failure rates (on a per-gate basis) have continued 
to decrease with improvements in screening and processing 
technology. It is the industry's goal to continue this trend, 
in the face of the new and forceful challenges which VLSI 
technology will present (Reference 14). Hence, the viability 
of RAMP as a means of realizing ultra-reliable avionics 
systems hinges on those future developments. 

The concepts underlying RAMP, in particular the use of 
autonomous, asynchronous, intermittent-fault tolerant modules 
for control, have broad, immediate application. Correspondingly, 
these concepts present several new research challenges for both 
the systems theorist and experimentalist in the general areas 
of distributed asynchronous microcomputer networks, testing, 
and fault masking and identification. 


APPENDIX A - EFFECT OF TIMING ERRORS IN PARALLEL 
ASYNCHRONOUS FLIGHT CONTROL COMPUTERS 


As described in the main body of the text, it is the 
function of each microcomputer in the RAMP structure to 
mid-value select operands from the upstream set of redundant 
computers. To simplify the discussion of the effects of 
asynchronism, consider the case of two parallel computers 
$M_S$ and $M_D$ employed in a control configuration as shown in 
Figure A1. As seen in the figure, $M_S$ is the computer 
selected for the control, i.e., is operating in the closed loop. 
Computer $M_D$ is deselected and is hence operating open loop. 

As also explained earlier, each of a given set of parallel 
computers in the RAMP structure has n cache memories 
such that the nth cache memory in each computer in the set 
is simultaneously written by the nth upstream computer. 
This process is represented in Figure A1 by a sample-and-hold 
function as shown (i.e., only the selected upstream 
computer is depicted). Computers $M_D$ and $M_S$ in turn sample 
the held values as shown. This sampling process is depicted 
in Figure A2, wherein the samplings by $M_D$ and $M_S$ are shown 
by impulse functions. In the development that 
follows, it is assumed that the selected computer $M_S$ samples 
at the same rate that $y(t)$ is sampled, with each sample taken 
at a time immediately following the sampling of $y(t)$. Computer 
$M_D$ samples at a higher rate as shown, such that at times $iT$ 
and $(i + n)T$ computers $M_S$ and $M_D$ simultaneously sample 
the input. Between these times, computer $M_D$ samples an 
input value immediately preceding that sampled by $M_S$. Note 
that in the period $nT$, computer $M_D$ takes exactly one more 
sample (and hence performs one additional computation) than $M_S$. 
Now both computers will execute the same control law 
given by the following recurrence equation: 

$$u[(i+1)T] = A\,u(iT) + B\,y[(i+1)T] \tag{A1}$$

In what follows it is shown that Equation (A1) must be 
stable in order that the deselected computer not accumulate 
unacceptable errors due to its asynchronism. Stability of 
Equation (A1) is however not required for stability of the 
closed loop system. The existence and form of the error is 
arrived at inductively in the following. 

At time $iT$, let the selected and deselected computers 
respectively compute control values $u_S(iT)$ and $u_D(iT)$. 
Moreover, assume that the deselected computer has an error 
$u_e(iT)$ due to the asynchronism such that 

$$u_D(iT) = u_S(iT) + u_e(iT) \tag{A2}$$

Now compute $u_S[(i+n)T]$ and $u_D[(i+n)T]$ as follows. 

First, note that in solving Equation (A1), 

$$u_S[(i+n)T] = A^{n} u_S(iT) + \sum_{k=i}^{i+n-1} A^{\,i+n-1-k} B\,y[(k+1)T] \tag{A3}$$

Through study of Figure A2 it can be seen that 

$$u_D[(i+n)T] = A^{n+1} u_D(iT) + A^{n} B\,y(iT) + \sum_{k=i}^{i+n-1} A^{\,i+n-1-k} B\,y[(k+1)T] \tag{A4}$$

From (A2), 

$$u_D(iT) = u_S(iT) + u_e(iT)$$

Substituting this into Equation (A4) and subtracting 
Equation (A3), the error at time $(i + n)T$ is obtained: 

$$u_e[(i+n)T] = A^{n+1} u_e(iT) + A^{n}(A - I)\,u_S(iT) + A^{n} B\,y(iT) \tag{A5}$$

The assumption that $u_e(iT)$ is an error due to asynchronism 
is correct since at $t = 0$, $u_e(0) = 0$ (i.e. the first sample is 
taken simultaneously by the computers) and at time $t = T$, 
$u_e(T)$ arises solely from the asynchronism and is in general 
non-zero. 

Equation (A5) may be written in final form: 

$$u_e[(j+1)nT] = A^{n+1} u_e(jnT) + A^{n}[A - I]\,u_S(jnT) + A^{n} B\,y(jnT) \tag{A6}$$

Now for (A6) to be stable, matrix $A^{n+1}$ must have eigenvalues 
$\lambda_i$ such that 

$$|\lambda_i| < 1 \tag{A7}$$

Correspondingly, for Equation (A1) to be stable, matrix $A$ 
must have eigenvalues $\lambda_i'$ subject to the same condition. 
But 

$$\lambda_i = (\lambda_i')^{n+1}$$

i.e., 

$$|\lambda_i| = |\lambda_i'|^{n+1}$$

Hence, to guarantee convergence of the error in the 
deselected computer due to asynchronism, it is necessary 
that the control law being computed be asymptotically stable. 

Note further that when n is large (as is the case in 
practical applications) Equations (A1) and (A6) have the same 
natural response. E.g. at time $t = mnT$ (m an integer) 

$$u_S(mnT) = A^{mn} u_S(0).$$

Correspondingly, 

$$u_e(mnT) = A^{m(n+1)} u_e(0) \approx u_S(mnT).$$

Hence, the rate of convergence of the asynchronism 
error introduced into the deselected computer is the same 
as that of the control law being computed. 
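
The error expression of Equation (A5) can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original analysis: it uses a scalar control law (A and B replaced by the hypothetical scalars `a` and `b`), and it assumes, following the description of Figure A2, that the deselected computer's one additional computation uses the held value y(iT) before it processes the same n samples as the selected computer.

```python
from typing import Sequence, Tuple


def step(a: float, b: float, u: float, y: float) -> float:
    """One iteration of the scalar control law u <- a*u + b*y."""
    return a * u + b * y


def simulate_frame(
    a: float, b: float, u_s: float, u_e: float, y: Sequence[float]
) -> Tuple[float, float]:
    """Advance both computers from time iT to (i+n)T.

    y holds the n+1 samples y(iT), ..., y((i+n)T).
    Returns (u_S, u_D) at time (i+n)T.
    """
    n = len(y) - 1
    u_d = u_s + u_e                     # Equation (A2) at time iT
    for k in range(1, n + 1):           # selected computer: n computations
        u_s = step(a, b, u_s, y[k])
    u_d = step(a, b, u_d, y[0])         # deselected: extra computation on y(iT)
    for k in range(1, n + 1):           # then the same n samples
        u_d = step(a, b, u_d, y[k])
    return u_s, u_d


def predicted_error(a: float, b: float, u_s: float, u_e: float, y0: float, n: int) -> float:
    """Scalar form of Equation (A5)."""
    return a ** (n + 1) * u_e + a ** n * (a - 1.0) * u_s + a ** n * b * y0
```

For a stable coefficient such as `a = 0.8`, evaluating `predicted_error` with a larger `n` gives a smaller error magnitude, since every term carries a factor of at least `a ** n`.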

Finally, looking at the last two terms on the right of 
Equation (A6), the magnitude of the error in the deselected 
computer can be reduced by one or both of the following: 

1) Increasing the rate of convergence in the control 
law. 

2) Increasing n (i.e. reducing the timing error). 

APPENDIX B - MID-VALUE SELECT FOR FAULT MASKING 


Section V described the strategy of selecting the mid-value 
of a set of N = (2n + 1) data values as a means of masking 
faults in the RAMP network. The following paragraphs 
explain in greater detail the justification for this approach. 

Recall that the effect of the timing errors in the set of 
N computers is to disperse the data values over some range 
$2\epsilon$ about that value which would be obtained with no timing 
error. This is illustrated in Figure B1, in which each of the 
N values is classified as being correct or incorrect as follows. 
A correct value lies within $\pm\epsilon$ of the value that would be 
obtained with no timing error; all other values are incorrect. 
Given no faults (i.e. timing error only) all values would be 
correct. On the other hand, a data value generated by a 
faulted computer could in general be either correct or incorrect. 

It can now be argued that given up to n failures in the N 
computers, the mid-value will always be correct: If the 
mid-value originated from a non-failed computer, it is obviously 
correct. If on the other hand it originated from a faulted 
computer it will still be correct. The reason for this is that 
the mid-value by definition has exactly n distinct values greater 
than or equal to it and exactly n distinct values less than or 
equal to it. Hence, if the mid-value is generated from a 
faulted computer there must remain up to n - 1 values 
originating from faulted computers, so that at least one of the 
n values on each side originates from a non-failed computer. 
The mid-value must accordingly lie between two correct values 
and therefore be correct. 
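
The masking argument can be exercised with a randomized check. This is a sketch under assumptions, not a proof and not the RAMP code: faulted values are drawn from an arbitrary wide range chosen for illustration, and the function names `mid_value` and `masking_holds` are hypothetical.

```python
import random
from typing import Sequence


def mid_value(values: Sequence[float]) -> float:
    """Return the middle element of an odd-length set of values."""
    if len(values) % 2 == 0:
        raise ValueError("mid-value select needs an odd number of values")
    return sorted(values)[len(values) // 2]


def masking_holds(n: int, true_value: float, eps: float, trials: int, seed: int = 0) -> bool:
    """Check that the mid-value of N = 2n + 1 values stays within eps
    of true_value whenever at most n of the values come from faulted computers."""
    rng = random.Random(seed)
    total = 2 * n + 1
    for _ in range(trials):
        faults = rng.randint(0, n)  # up to n faulted computers
        good = [true_value + rng.uniform(-eps, eps) for _ in range(total - faults)]
        # Faulted values may land anywhere, inside or outside the correct band.
        bad = [true_value + rng.uniform(-100 * eps, 100 * eps) for _ in range(faults)]
        # Small relative tolerance guards against floating-point rounding.
        if abs(mid_value(good + bad) - true_value) > eps * (1 + 1e-9):
            return False
    return True
```

For example, `mid_value([1.0, 50.0, 1.1])` selects 1.1, so the single wild value 50.0 is masked.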

This argument also shows why N must be odd, since in 
using an even number N = 2n of computers only (n - 1) 
failures can be tolerated, the same condition that would 
correspond to use of an odd number, N - 1, of computers. 
I.e., the use of an even number of computers is uneconomic. 

Given an even number of computers there is of course no 
mid-value. One would instead select the median value of 
the two innermost values on the real line. 

(b) Example Program 